%background
In this section, we provide a basic introduction to the benefits and
limitations of the steep-slope devices we consider, and
describe how we extrapolate from these effects to processor-level
models.

\subsection{Basic background on steep-slope devices}
\emph{Steep-slope devices} have been proposed as a way to
overcome the 60~mV/decade subthreshold swing limit that restricts the
scaling of conventional CMOS transistors. This limit causes
an exponential increase in leakage current as the supply voltage
($V_{dd}$) scales to near- and sub-threshold values while the threshold
voltage ($V_T$) remains roughly constant.
In contrast, steep-slope devices are not subject to this limit
because their charge transport mechanism relies on tunneling
through the intrinsic region.
Hence, at near-threshold and subthreshold voltages, steep-slope devices have the
potential to outperform CMOS devices by several orders of magnitude.
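
For reference, the 60~mV/decade figure follows from the standard
thermionic limit on the subthreshold swing $S$ of a conventional
MOSFET. The relation below is the textbook form, with room
temperature ($T = 300$~K) assumed:
\[
  S \;=\; \frac{dV_{gs}}{d\left(\log_{10} I_{ds}\right)}
    \;\geq\; \ln(10)\,\frac{kT}{q}
    \;\approx\; 60\,\mathrm{mV/decade},
\]
where $k$ is Boltzmann's constant and $q$ is the electron charge.
In other words, at least 60~mV of gate swing is needed for each decade
of current reduction, so lowering $V_T$ along with $V_{dd}$ raises the
off-state current exponentially, roughly as
$I_{\mathrm{off}} \propto 10^{-V_T/S}$.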

Heterojunction Tunnel FETs (TFETs) are among the most promising steep-slope devices, in terms of both
subthreshold operation and high-speed switching, and
TFET devices have been shown to scale well into future process nodes~\cite{Lu-tfet-scaling}.
%Logic and memory circuits have been designed using TFET devices and TFET processors are projected to be in production by 2020.
Experimental demonstrations of III-V n-FET and SiGe p-FET co-integration~\cite{czornomaz-iedm13} and of vertical III-V TFET integration on a Si substrate~\cite{rooyackers-iedm13,tomioka-iedm13} make heterogeneous integration of TFET and CMOS devices on the same die possible.
TFETs do, however, have some drawbacks
compared to CMOS technology.  As the supply voltage is increased, the
inherent limitation in the TFET charge-carrying mechanism causes the
current to saturate above a certain operating voltage.  Because the
tunneling current saturates, the switching delay remains
constant beyond that supply voltage.  At the processor level, this
translates to a limited range of operating frequencies for
TFET-based designs.

TFET device characteristics can be tuned to some extent to
improve frequency and power responses by altering the channel
length~\cite{iedm12}.  By using a low standby power (LSTP) TFET with an
increased channel length, our TCAD~\cite{tcad-sentaurus} simulations
show that we can achieve the required drive current while
reducing overall power consumption by 2$\times$ relative to existing
device models~\cite{codes12}.


\subsection{Extrapolation to processor model}
We use
McPAT~\cite{mcpat} to obtain the total processor power for an
equivalent FinFET transistor based technology.  We obtain the
corresponding TFET core power by scaling this obtained value with
transistor level delay and power parameters.  
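
As a minimal sketch of this scaling step, one possible form separates
the dynamic and leakage components reported for the FinFET core. The
symbols below are notation introduced here for illustration and are
not necessarily the exact procedure used:
\[
  P^{\mathrm{TFET}}_{\mathrm{core}} \;\approx\;
    s_{\mathrm{dyn}}\,P^{\mathrm{FinFET}}_{\mathrm{dyn}}
    \;+\; s_{\mathrm{leak}}\,P^{\mathrm{FinFET}}_{\mathrm{leak}},
\]
where $s_{\mathrm{dyn}}$ and $s_{\mathrm{leak}}$ are hypothetical
TFET-to-FinFET ratios of transistor-level dynamic and leakage power,
evaluated at the supply voltage at which the TFET device delay meets
the target clock period.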

Such simple scaling does not, however, take into account the
interconnect delay and power in the processor. Assuming that the entire
processor consists solely of transistors subject to CMOS-TFET
scaling would result in substantial inaccuracies in the model.
Figure~\ref{fig:wire-power-delay} a) and b) show
wire power and wire delay as fractions of the total core power and
critical path delay, respectively, for both CMOS and TFET processors of
different issue-width configurations.  In the lower-power TFET processors, the
increase in wire power with architectural complexity is non-uniform,
even though wire lengths increase monotonically with issue width.
This is because the leakage power increases non-linearly with respect
to the wire power, modifying the relative ratios of dynamic, leakage
and wire power.  Since these wire parameters are relatively invariant
to core frequency, the proportion of wire power is highest at low
frequencies (minimum logic power) and the proportion of wire delay is
highest at high frequency (minimum logic delay) points.  Hence we plot
Figure~\ref{fig:wire-power-delay} a) and b) at these corresponding
extreme points (500~MHz and 2~GHz respectively for a CMOS processor
and 500~MHz and 1.5~GHz for a TFET processor).  We observe that the
contribution of wire delay to the total critical path delay can be
significant (up to 30\% for both CMOS and TFET processors). Similarly,
wire power constitutes nearly 30\% of the total
processor power in CMOS processors and nearly 50\% of the total power in TFET
processors.
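
The choice of these extreme points can be made explicit with a simple
decomposition. This is a sketch in notation introduced here; the
symbols are illustrative and are not McPAT outputs:
\[
  \phi_P \;=\; \frac{P_{\mathrm{wire}}}{P_{\mathrm{logic}} + P_{\mathrm{wire}}},
  \qquad
  \phi_D \;=\; \frac{t_{\mathrm{wire}}}{t_{\mathrm{logic}} + t_{\mathrm{wire}}},
\]
where $P_{\mathrm{logic}}$ denotes the transistor (dynamic plus leakage)
power and $t_{\mathrm{logic}}$ the transistor delay along the critical
path. If $P_{\mathrm{wire}}$ and $t_{\mathrm{wire}}$ are taken as
roughly independent of core frequency, $\phi_P$ is largest where
$P_{\mathrm{logic}}$ is smallest (low frequency) and $\phi_D$ is
largest where $t_{\mathrm{logic}}$ is smallest (high frequency).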


\begin{figure}[ht!]
  \centering
    \epsfig{file=figs/wire_power_delay.eps, angle=0, width=0.9\linewidth, clip=}
   \caption{\label{fig:wire-power-delay} a) Wire power as a fraction of total core power and b) wire delay as a fraction of total critical path delay.}
\end{figure}

When incorporating wire models into our device-to-processor
abstraction, we assumed a direct device substitution from Si FinFET to
TFET technology at 22~nm.  Accordingly, we scaled the delay and power of
all transistors in logic and memory components.  For memory components,
TFET designs have additional structural differences due to their
unidirectional conduction. The wire models largely remain the same
across technologies, except for the buffer and repeater logic in large
wires, which we rescaled according to relative drive strengths.
%and McPAT models.

%  This was included in our calculations by using a
% modified version of the Elmore wire delay RC model.

Figure~\ref{fig:crossover} shows the power-frequency tradeoffs for a
2-issue Atom-like core and a 4-issue Ivybridge-like core, each
realized in both CMOS and TFET technology.  We define the ``crossover
frequency'' as the operating frequency (and associated voltage) at which
the TFET- and CMOS-based designs provide equivalent energy
efficiency. The crossover frequency ($f_c$) is lower for the
Ivybridge core than for the Atom core.  This is because the timing
constraints for the complex core (in McPAT) are more severe and the
slack allowed to the slower TFET transistor is relatively lower.
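
The crossover definition above can be written compactly. As a sketch,
we assume here that energy efficiency is measured as energy per clock
cycle, $E(f) = P(f)/f$, where $P(f)$ is the core power at frequency
$f$; this notation is introduced for illustration only:
\[
  E_{\mathrm{TFET}}(f_c) \;=\; E_{\mathrm{CMOS}}(f_c)
  \quad\Longleftrightarrow\quad
  P_{\mathrm{TFET}}(f_c) \;=\; P_{\mathrm{CMOS}}(f_c).
\]
Under this assumption, $f_c$ is the frequency at which the TFET and
CMOS power-frequency curves of a given core intersect in
Figure~\ref{fig:crossover}.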

%figs all crossover plots
\begin{figure}[ht!]
  \centering
    \epsfig{file=figs/crossover.eps, angle=0, width=1\linewidth, clip=}
    \caption{\label{fig:crossover} Variation of CMOS and TFET core power with frequency for a simple (Atom-like) core and a complex (Ivybridge-like) core. The crossover point shifts to lower frequencies as core complexity increases.}
\end{figure}


\begin{comment}
Figs:
\begin{itemize}
\item LSTP v/s LOP model
\item Technology scaling trends in CMOS and TFET (Avci ?)
\end{itemize}

Figs:
\begin{itemize}
\item Breakdown of wire and logic power (delay) for different processor configurations
\item Crossover figs with and without wire models.
\end{itemize}
\end{comment}


% LocalWords:  TFET CMOS subthreshold TFETs FinFET
